In late 2019, a novel coronavirus, severe acute respiratory syndrome
coronavirus 2 (SARS-CoV-2), which cause a severe respiratory disease,
emerged in Wuhan, China. The virus spread all over the world, and
infected more than 750 million people (WHO
2020). It is reported that patients in different population shows
different trend of symptoms. For example, females are less likely to
infect SARS-CoV-2 (Steeg 2016). However,
mechanisms behind such a difference is still unknown. Lieberman et.
al. examined gene expression in response to SARS-CoV-2 infection with
shotgun RNA Sequencing to see the gene expression changes(Lieberman 2020). Test condition is 430
SARS-CoV-2 patients and control condition is 54 non-infected
individuals.
In the first assignment, RNASeq data of the paper by Lieberman(Lieberman 2020) were obtained from GEO, and its
ID is GSE152075. There are 35784 genes in this dataset, and 430 samples
from SARS-CoV-2 patients i.e. testing samples and 54 samples from
controls. The RNASeq data was mapped to HUGO gene symbols, cleaned
e.g. duplicates and low counts were removed, normalized with TMM
normalization. MDS plot indicates a good quality of the dataset as
positive and negative samples are well-separated. In the second
assignment, the normalized data is analyzed with differential gene
expression analysis, and genes are ranked. Then thresholded
over-representation analysis reveals the notably expressed gene. The
result showed a trend up-regulation of anti-viral factors and
down-regulation of genes related to ribosomes, which is mostly
consistent with the Lieberman et.al. report. In this assignment,
non-thresholded pathway analysis is performed. The result is visualized
with Cytoscape and the Enrichment map pipeline.
# Download Packages.
if (!requireNamespace("knitr", quietly = TRUE))
BiocManager::install("knitr")
# Download data from the second assignment.
qlf_output <- read.table("qlf_output.txt")
knitr::kable(head(qlf_output), format = "html", digits = 64)
| logFC | logCPM | F | PValue | FDR | |
|---|---|---|---|---|---|
| RPLP1 | -4.622509 | 7.266273 | 251.3605 | 1.487846e-46 | 2.233257e-42 |
| RPL13A | -3.476013 | 8.491672 | 227.3249 | 5.948578e-43 | 4.464408e-39 |
| RPL13 | -3.210374 | 7.769872 | 191.9931 | 1.934494e-37 | 9.678919e-34 |
| RPS18 | -2.933320 | 7.886842 | 178.3334 | 3.104769e-35 | 1.165064e-31 |
| RPS8 | -3.137795 | 7.708668 | 175.7935 | 8.070958e-35 | 2.422902e-31 |
| RPS3A | -2.641507 | 8.238098 | 156.0558 | 1.529206e-31 | 3.825563e-28 |
Gene Set Enrichment Analysis, GSEA (Mootha 2003) (Subramanian et al. 2005), version 4.3.3 is used for non-thresholded pathway analysis. As a geneset, Human_GOBP_AllPathways_noPFOCR_no_GO_ies_March_01_2024.gmt was taken from Gary Bader lab, University of Toronto (Bader 2024). Note that GSEA was run following a tutorial by Isserlin (Isserlin 2023).
A ranked list is generated based on the ranked genes set from the second assignment.
# Make a ranked list.
ranked_qlf_output <- data.frame(rownames(qlf_output), - log10(qlf_output$PValue) * sign(qlf_output$logFC))
colnames(ranked_qlf_output) <- c("Genes", "Rank")
# Sort decsending for enrichment map.
ranked_qlf_output <- ranked_qlf_output[order(ranked_qlf_output$Rank, decreasing=TRUE),]
knitr::kable(head(ranked_qlf_output), format = "html")
| Genes | Rank | |
|---|---|---|
| 7 | IFIT1 | 29.85565 |
| 9 | XAF1 | 29.29075 |
| 11 | OAS3 | 28.75220 |
| 13 | OAS2 | 28.01861 |
| 17 | IFI44L | 27.34463 |
| 24 | CXCL10 | 25.09480 |
# Save as .rnk file for GSEA.
write.table(ranked_qlf_output, file.path(getwd(), "ranked_qlf_output.rnk"), sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
GSEA was performed with GUI. The ranked list and the geneset above
are used. Regarding to options, the number of permutations is set to
100, there is no collapse to gene symbols as already done in the first
assignment, max/min size is 200/10, and weighted is used for enrichment
method. 7503 genes were used for the analysis, which is enough. There
were 650 and 502 gene sets had FDR less than 25% for positive/negative
phenotype respectively, and FDR threshold needs to be lower. Snapshots
showed the expected trend. Positive/negative phenotype top genesets are
dominant with up/down-regulated genes respectively.
Figure 1: Snapshot of GSEA top 20 positive phenotype genesets.
Figure 2: Snapshot of GSEA top 20 negative phenotype genesets.
Tabular result shows the correlation between immune-related pathways
and positive phenotype genesets. Note that many pathways are involved in
interferon. On the other hand, pathways related to protein synthesis are
dominant in the result for negative phenotype genesets. Figure 3: GSEA top 10
results for positive phenotype genesets.
Figure 4:
GSEA top 10 results for negative phenotype genesets.
The results of thredholded/non-thresholded analysis are very similar. For positive patients, immune pathways, especially interferon, are often seen, and negative patients results tend to have translation-related pathways. It is interesting that Lieberman states there is a reduction in transcription of some interferon-related genes(Lieberman 2020) but both analysis showed the opposite trend. This is not clearly understood, but this is possibly because of gender or ethnicity. Chen et. al. report that transcription of ACE2, which is a interferon-responsive genes, significantly increase in Asian females, moderately decrease in elderly people in all ethnic groups, and highly significant decrease in type II diabetic patients(Chen et al. 2020). There is no information about patients except for SARS-CoV-2 positive/negative in the original paper due to privacy reason. Therefore, it is impossible to analyse further here.
Figure 5: g:Profiler result of up-regulated genes from GO biological
process. Top 10 results are shown here.
Figure 6: g:Profiler
result of down-regulated genes from GO biological process. Top 10
results are shown here.
An enrichment map was generated with EnrichmentMap app in Cytoscape.
There are 133 nodes and 1485 edges. FDR q-value cutoff, p-value cutoff
and edge cutoff were set to 0.001, 0.05 and 0.375 respectively.
Figure 7: Enrichment Map for GSEA result. Red nodes are for up-regulated genes and blue nodes are for down-regulated genes.
An annotated network was generated with AutoAnnotate app in
Cytoscape. Parameters are shown in figure 8. In addidion to those
parameters, font scale is set to 33%.
Figure 8: Annotated Network for GSEA result.
Figure 9: Parameters for the Annotated Network.
The result of this report mostly supports the original paper. It is stated that anti-viral factors are up-regulated and genes involved in translation are down-regulated (Lieberman 2020). The figures above shows the trend of up-regulated immune system genes and down-regulated protein synthesis genes. This result is also consistent with the result of the second assignment. Chen et. al. report that transcription of ACE2, which is a interferon-responsive genes, significantly increase in Asian females, moderately decrease in elderly people in all ethnic groups, and highly significant decrease in type II diabetic patients(Chen et al. 2020). This supports our result, because there is a cluster of up-regulated interferon-related genes in the middle of figure 7.